Directed graphs (DG), interpreted as state transition diagrams, are traditionally used to represent
finite-state automata (FSA). In the context of formal languages, FSA and regular expressions
(RE) are equivalent in that they accept and generate, respectively, type-3 (regular) languages. Based
on our previous work, this paper analyzes the effects of graph manipulations on the corresponding RE. In
this initial stage we assume that the DG under consideration contains no cycles. Graph
manipulation is performed by deleting or inserting nodes or arcs. Combined and/or multiple
application of these basic operators enables a great variety of transformations of DG (and corresponding
RE) that can be seen as mutants of the original DG (and corresponding RE). DG are popular for
modeling complex systems; however, they easily become intractable if the system under consideration
is complex and/or large. In such situations, we propose to switch to the corresponding RE in order
to benefit from their compact format for modeling and algebraic operations for analysis. The results
of the study are of great potential interest to mutation testing.

1 Introduction and related work 

Most model-based testing techniques operate on graphs, especially on directed graphs (DG). This
has been masterly expressed by one of the testing pioneers, Beizer, as "Find a graph and cover it!" 
CD 121. The basic idea behind "graph coverage" entails generation of test cases and the selection of a 
minimum number of them, called "test suite", in order to cost-effectively exercise a given set of structural 
or functional issues of the software under test (SUT). A good test coverage increases user confidence 
in software artifacts, showing that the software is doing everything it is supposed to do (positive
testing, H).

For implementation-oriented, white-box testing, nodes of the DG to be covered usually represent the 
statements of SUT; arcs represent the sequences of those statements [10]. For specification-oriented, 
black-box testing, nodes of the DG may represent the behavioral events of SUT; arcs represent the se- 
quences of those events 0. 

When using a graph to model the SUT, Belli et al. propose not only to cover the given DG model, but
also its complement, showing that the software is not doing anything it is not supposed to do {negative 
testing, (21 @1). For this, the authors propose specific manipulation operators of the graph that models 
SUT. The negative testing approach is related to mutation testing, which is originally a
white-box test technique [7]. Recently, Belli et al. proposed to extend the mutation testing approach to black
box, model-based testing 0. 

A tough problem with complex SUTs is that modeling graphs rapidly become large and thus tedious 
to work with. If the modeling DG can be interpreted as the transition diagram of a finite-state automaton 
(FSA), it might be helpful to transform the modeling DG into an algebraic format, i. e., regular expres- 
sions (RE), and work with this compact formulae instead of spacious graphs (also see |fl9l ). Thereby, 
well-known algorithms can be used to solve the problems concerning the transformation from DG to RE,
and v. v. (9l[Tll[T2l. In order to extract the RE from a given DG, one may follow the steps given below: 

• Convert DG to deterministic FSA (by interpreting the DG as a Moore Machine [15] and FSA as a 
Mealy Machine lfl4lD . 

• Convert the FSA to RE by using the widely known algorithms in the literature (also see [11]).

In addition, for the opposite chain of transformations, the following steps can be used:

• Convert RE to non-deterministic FSA (also see Ifl3l l8ll). 

• Convert non-deterministic FSA to a deterministic FSA (and minimize). 

• Convert the FSA to DG (similar to Mealy - Moore conversion). 

Application of the basic operators, as introduced in Q, to a DG transforms it to another DG, which 
likely corresponds to a different RE than the original one. Conversely, the corresponding DG of a
manipulated RE differs from the DG that corresponds to the original RE. One of the main objectives of our
research is to take the initial steps in order to increase the efficiency of mutation testing by determining, 
if possible, correspondences between DG and RE modifications. In testing literature, there are many 
varied constructs, such as DG, FSA, EFSA (extended FSA), ESG and state charts etc., which are used 
to model a SUT. Each of these graph-based representations possesses different syntax and semantics. In 
fact, in many cases they are presented as an extension of one another. The common arguments which can 
be drawn on these structures are (1) they all have (extended) RE counterparts, and (2) the more complex 
the SUT gets the harder they are to work with in their graphic format. lfl6l [T8l El 

To our knowledge there is no approach which aims to manipulate the corresponding RE in order to 
reflect alterations of the mutation operators performed on the given DG, or v. v. However, it is worth 
mentioning that there are several works on the algebra of RE which enables the transformations via some 
defined system of rules, such as EUl l6l [T71. Taking this into account and based on DG and RE, the
next section introduces the notions used in this paper, defines basic operators for graph manipulations
and finally introduces the "sum of products" format for canonical representation of regular expressions.
Section 3 applies those basic operators to DG and algorithmically generates their corresponding RE.
The complexity of these algorithms is determined (see also the Appendix), before Section 4 concludes the
paper with a summary of results already achieved and research work planned.

2 Notions used 

This section briefly and semi-formally summarizes the notions we need for the discussion in Section 3.

2.1 Directed graphs and regular expressions

Definition 1. A directed graph (DG) is the tuple $(V,A)$ where $V$ is a finite set of nodes, i. e.,
$V = \{v_1, \ldots, v_n\}$, and $A$ is a finite set of directed arcs which are ordered pairs of elements of $V$, i. e.,
$A = \{a_1, \ldots, a_m\} \subseteq V \times V$, where each $a_i = (v_j, v_k)$ for some $j, k$.

Definition 2. A regular expression (RE) consists of symbols of an alphabet and is used to express a 
set of strings (or words), i. e., a language. In an operational perspective, a RE can be assumed to be a 
sequence of symbols a, b, c, . . . of an alphabet which can be connected by operations 

• sequence (".", but usually no explicit operation symbol, e. g., "ab" means "b follows a"),

• selection ("+", e. g., "a + b" means "a or b"),

• iteration ("*", Kleene's Star Operation, e. g.,

- "a*" means "a will be repeated arbitrarily often";

- "a^+" means at least one occurrence of "a").

a(bd + c(d + ef))gh(ilm + jk(lm + no))pq

Figure 1: A sample DG and its corresponding RE

These operations are also applied on RE other than simple symbols and, as usual, parenthesization is 
used to guarantee the intended precedence and associativity. 

A sample DG and its corresponding RE are given in Figure 1. In order to define a RE representation
of a DG, we need to distinguish some nodes as start nodes and some others as finish nodes. In this
context, the set of nodes is considered as the alphabet (the set of symbols) and the words (strings) in the
language expressed by the RE are, in fact, the node sequences forming paths connecting start nodes to
finish nodes in the graph. This convention has been introduced in [2] to define event sequence graphs
(ESG).
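
The path-based reading of a RE can be illustrated with a minimal Python sketch. It assumes single-character node names and an arc list inferred here from the RE of Figure 1 (the drawing itself is not reproduced, so the arcs are an assumption). The helper `sopf_terms` is a hypothetical name; it treats nodes without ingoing arcs as start nodes and nodes without outgoing arcs as finish nodes, and enumerates every start-to-finish node sequence of an acyclic DG.

```python
from collections import defaultdict

# Arcs inferred from the RE a(bd + c(d + ef))gh(ilm + jk(lm + no))pq (an assumption).
ARCS = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e"),
        ("e", "f"), ("d", "g"), ("f", "g"), ("g", "h"), ("h", "i"),
        ("h", "j"), ("i", "l"), ("j", "k"), ("k", "l"), ("k", "n"),
        ("l", "m"), ("n", "o"), ("m", "p"), ("o", "p"), ("p", "q")]


def sopf_terms(nodes, arcs):
    """Return all start-to-finish paths of an acyclic DG as strings."""
    succ = defaultdict(list)
    has_pred = set()
    for u, v in arcs:
        succ[u].append(v)
        has_pred.add(v)
    starts = [v for v in nodes if v not in has_pred]  # no ingoing arcs

    def walk(node):
        if not succ[node]:  # no outgoing arcs: a finish node
            return [node]
        return [node + rest for nxt in succ[node] for rest in walk(nxt)]

    return sorted(term for s in starts for term in walk(s))


NODES = sorted({v for arc in ARCS for v in arc})
TERMS = sopf_terms(NODES, ARCS)
print(" + ".join(TERMS))
```

Each enumerated string is one word of the language, so joining the strings with "+" gives a fully expanded RE for the graph. The recursion terminates only because the DG is assumed to be free of cycles.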

2.2 Operators for manipulation of directed graphs 

For manipulation of a graph, or a DG, elementary operations can be classified under two categories,
insertion ($i$) and omission ($o$), and since a DG consists of nodes and edges, the manipulation operators
can be specified as node insertion ($i_n$), node omission ($o_n$), arc insertion ($i_a$) and arc omission ($o_a$)
operators.

Definition 3. DG manipulation operators transform a DG to another DG and are defined as follows:

• Arc insertion operator adds a new arc $(v_j, v_k)$, where $v_j, v_k \in V$, to the DG $(V,A)$:

$(v_j, v_k)i_a : (V,A) \to (V, A \cup \{(v_j, v_k)\})$.

• Arc omission operator deletes an arc $(v_j, v_k)$, where $v_j, v_k \in V$, from the DG $(V,A)$:

$(v_j, v_k)o_a : (V,A) \to (V, A \setminus \{(v_j, v_k)\})$.
It is possible that some nodes are left with no ingoing and/or outgoing arcs.

• Node insertion operator adds a new node $v \notin V$ to the DG $(V,A)$ together with a possibly nonzero
number of arcs $\{a_1, \ldots, a_k\}$ connecting this node to the remaining nodes:

$(v, a_1, \ldots, a_k)i_n : (V,A) \to (V \cup \{v\}, A \cup \{a_1, \ldots, a_k\})$.

• Node omission operator deletes a node $v \in V$ from the DG $(V,A)$ together with all the arcs
$a_1, \ldots, a_k \in A$ ingoing to and outgoing from the deleted node:

$(v)o_n : (V,A) \to (V \setminus \{v\}, A \setminus \{a_1, \ldots, a_k\})$.
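
The four operators can be written down almost literally as set operations. The following Python sketch is only an illustration under stated assumptions: a DG is a pair of frozensets `(nodes, arcs)`, and the function names `insert_arc`, `omit_arc`, `insert_node` and `omit_node` are hypothetical, not taken from the paper.

```python
def insert_arc(dg, arc):
    """Arc insertion: both endpoints must already be nodes of the DG."""
    nodes, arcs = dg
    if arc[0] not in nodes or arc[1] not in nodes:
        raise ValueError("both endpoints must belong to V")
    return nodes, arcs | {arc}


def omit_arc(dg, arc):
    """Arc omission: the nodes stay, possibly without ingoing/outgoing arcs."""
    nodes, arcs = dg
    return nodes, arcs - {arc}


def insert_node(dg, node, new_arcs=()):
    """Node insertion: add a new node plus arcs linking it to existing nodes."""
    nodes, arcs = dg
    if node in nodes:
        raise ValueError("node already belongs to V")
    for u, v in new_arcs:
        other = v if u == node else u
        if node not in (u, v) or other not in nodes:
            raise ValueError("each arc must link the new node to a node in V")
    return nodes | {node}, arcs | frozenset(new_arcs)


def omit_node(dg, node):
    """Node omission: remove the node and every arc incident to it."""
    nodes, arcs = dg
    incident = frozenset(a for a in arcs if node in a)
    return nodes - {node}, arcs - incident


dg = (frozenset("abc"), frozenset({("a", "b"), ("b", "c")}))
dg1 = insert_arc(dg, ("a", "c"))
dg2 = omit_arc(dg1, ("a", "b"))
dg3 = insert_node(dg2, "d", [("c", "d")])
dg4 = omit_node(dg3, "b")
```

Because every function returns a new pair instead of modifying its argument, the original DG and each mutant remain available side by side, which is convenient when several mutants are derived from one model.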

Figure 2 results from the application of basic manipulation operators to the DG in Figure 1.

a(bd + bdf + cef)gh(i + jk)lmpq + opq

Figure 2: The manipulated DG and RE by application of $(cd)o_a$ $(df)i_a$ $(n)o_n$

2.3 Sum of products format for RE and auxiliary functions

In order to carry out transformation on the RE in an algorithmic way, we introduce, in analogy to Boolean 
Algebra, a canonical representation for RE under consideration and some auxiliary functions which 
operate on RE. 

Definition 4. A given RE is in the sum of products format (SOPF), if it is represented as the sum of 
finitely many product terms, each of which is in one of the following forms: 

• r 

• R* 

• Finite concatenations of r and/or R* (such as rR* , R*r and rR*rR*, etc.) 

where r is an arbitrary finite string (formed only by the concatenation of symbols) and R is a RE in SOPF.
Note that the SOPF is a very simple and straightforward format which largely disregards compactness.

Example 5. Let the RE in Figure 1 be R, then the SOPF of R is given as below

SOPF(R) = abdghilmpq + abdghjklmpq + abdghjknopq +
acdghilmpq + acdghjklmpq + acdghjknopq +
acefghilmpq + acefghjklmpq + acefghjknopq.

Definition 6. Let D be the DG with the set of vertices $\{v_1, \ldots, v_n\}$ and R be the corresponding RE, then
we define:

• pt(R, s) to be the set of product terms which contain the string s, 

• ht(P, s) to be the set of product terms which are the beginning (head) subterms, ending with the 
first occurrence of the string s, of the product terms in the set P, and 

• tt(P, s) to be the set of product terms which are the ending (tail) subterms, beginning with the last 
occurrence of the string s, of the product terms in the set P. 

Example 7. Let the SOPF of the RE in Figure 1 be R, then we have

• pt(R,jkl) = {abdghjklmpq, acdghjklmpq, acefghjklmpq},

• ht(P,f) = {acef}, and

• tt(P,gh) = {ghilmpq, ghjklmpq, ghjknopq}.
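
The three auxiliary functions can be sketched in Python as follows. The sketch assumes that every symbol is a single character, that a RE in SOPF without iteration is stored as a set of strings, and that `ht` and `tt` simply skip terms which do not contain `s`; these representation choices are assumptions made for illustration.

```python
def pt(terms, s):
    """Product terms that contain the string s."""
    return {t for t in terms if s in t}


def ht(terms, s):
    """Head subterms: from the start up to and including the first s."""
    return {t[: t.index(s) + len(s)] for t in terms if s in t}


def tt(terms, s):
    """Tail subterms: from the last occurrence of s to the end."""
    return {t[t.rindex(s):] for t in terms if s in t}


# The nine product terms of Example 5.
R = {
    "abdghilmpq", "abdghjklmpq", "abdghjknopq",
    "acdghilmpq", "acdghjklmpq", "acdghjknopq",
    "acefghilmpq", "acefghjklmpq", "acefghjknopq",
}

print(sorted(pt(R, "jkl")))
print(sorted(ht(R, "f")))
print(sorted(tt(R, "gh")))
```

Returning sets removes duplicate subterms automatically, which is why `tt(R, "gh")` yields three tails although nine product terms contain "gh".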
3 Approach: graph manipulation and effects on corresponding regular
expression

As discussed in the introduction, the problem, in its general form, is to expose the
underlying correspondences between DG and RE manipulations. More precisely, we have the following
situation: Given a DG and the corresponding RE, we want to reflect the result of DG transformations
stemming from applications of basic manipulation operators to the corresponding RE, and v. v. For this
purpose, we assume that (1) the initial and transformed DG have no cycles and (2) all the RE are in the
SOPF. Furthermore, the analysis of the effects of RE manipulation on the corresponding DG is postponed;
at this stage we focus only on the effects of manipulating a DG on its RE.

Under the assumptions stated above, the following subsections outline straightforward algorithms for
basic manipulation of a DG by transforming its corresponding RE. In the discussion ahead, $|P|$ and
$|p|$ are defined to be upper bounds on the number of product terms and on the lengths of the product terms
in the given RE, respectively. Complexity values of the auxiliary algorithms are included in the Appendix
and Table 1; they are necessary for the validation of the worst case time complexity results of the DG
manipulation algorithms which are analyzed in the next subsections.

3.1 Arc operators 

In the following, omission and insertion operations are applied to arcs.

3.1.1 Arc insertion

Algorithm 1 outlines the addition of new paths connecting start and finish nodes in the DG as product
terms to the given RE, during the insertion of the arc $(v_i, v_j)$, where $v_i, v_j \in V$, to the DG. During the
insertion, no product term in the RE should contain the symbol $v_j$ before $v_i$. Otherwise, the operation
produces a cycle.

Input: R - a RE in SOPF,
       (v_i, v_j), where v_i, v_j ∈ V - arc to be inserted
Output: R - RE is updated with the insertion of the arc (v_i, v_j) in SOPF

A = pt(R, v_i)      // Find the set of product terms containing v_i
B = pt(R, v_j)      // Find the set of product terms containing v_j
A' = ht(A, v_i)     // Construct the set of head subterms for A
B' = tt(B, v_j)     // Construct the set of tail subterms for B
C = A'.B'           // Perform the set concatenation operation on A' and B'
R = R + RE(C)       // Add the terms in C as product terms to R
Algorithm 1: Arc Insertion
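
A minimal Python sketch of these steps is given below. It assumes single-character symbols and a RE in SOPF stored as a set of strings; the name `insert_arc_re` and the explicit cycle guard are illustrative additions, not part of the algorithm as stated.

```python
def pt(terms, s):
    return {t for t in terms if s in t}


def ht(terms, s):
    return {t[: t.index(s) + len(s)] for t in terms if s in t}


def tt(terms, s):
    return {t[t.rindex(s):] for t in terms if s in t}


def insert_arc_re(R, vi, vj):
    """Update the set of product terms R for the inserted arc (vi, vj)."""
    for t in R:
        # A term with vj before vi means the new arc would close a cycle.
        if vi in t and vj in t and t.index(vj) < t.index(vi):
            raise ValueError("arc insertion would produce a cycle")
    heads = ht(pt(R, vi), vi)    # A': paths from start nodes to vi
    tails = tt(pt(R, vj), vj)    # B': paths from vj to finish nodes
    new_terms = {h + t for h in heads for t in tails}  # C = A'.B'
    return R | new_terms         # set union drops duplicates


R = {
    "abdghilmpq", "abdghjklmpq", "abdghjknopq",
    "acdghilmpq", "acdghjklmpq", "acdghjknopq",
    "acefghilmpq", "acefghjklmpq", "acefghjknopq",
}
R_df = insert_arc_re(R, "d", "f")
print(len(R_df) - len(R), "new product terms")
```

For the arc (d, f) there are two heads ("abd", "acd") and three tails starting with "f", so six product terms are added, in line with the count $|A'||B'|$.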

As implied by Algorithm 1, in the insertion of the arc $(v_i, v_j)$, the number of new product terms to be
added to the RE is given by $|A'||B'|$, where $|A'|$ is the number of (distinct) head subterms leading to the
node $v_i$ from the start nodes and $|B'|$ is the number of (distinct) tail subterms leading to the finish nodes
from the node $v_j$.

Algorithm 1 terminates, since all the subroutines are executed in finite time. Furthermore, a
straightforward calculation using the values in Table 1 shows that Algorithm 1 has the worst case time
complexity $O(|P|^4 |p|)$. It is possible to reduce this complexity value to $O(|P|^2 |p|)$ by performing the set
concatenation without filtering the duplicate product terms while constructing the set $C$ in $O(|P|^2 |p|)$
time. These duplicate terms can be left out during the set union operation without affecting its worst case
time complexity.

3.1.2 Arc omission 

Omission of an arc may leave some nodes with no ingoing and/or outgoing edges. These nodes are
considered as valid start and/or finish nodes respectively, because the succeeding operations may introduce
new edges to such nodes. Thus, Algorithm 2 updates the given corresponding RE after the omission of
the arc $(v_i, v_j)$, where $v_i, v_j \in V$, from the DG.

Input: R - a RE in SOPF,
       (v_i, v_j), where v_i, v_j ∈ V - arc to be omitted
Output: R - RE is updated with the omission of the arc (v_i, v_j) in SOPF

A = pt(R, v_i)          // Find the set of product terms containing v_i
B = pt(R, v_j)          // Find the set of product terms containing v_j
C = pt(R, v_i v_j)      // Find the set of product terms containing v_i v_j
A' = ∅
if A = C then
    A' = ht(A, v_i)     // Construct the set of head subterms for A
endif
B' = ∅
if B = C then
    B' = tt(B, v_j)     // Construct the set of tail subterms for B
endif
C' = A' ∪ B'            // Union of A' and B'
R = R - RE(C)           // Remove the product terms in C from R
R = R + RE(C')          // Add the terms in C' as product terms to R
Algorithm 2: Arc Omission
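
The arc omission steps can be sketched in Python under the same assumptions as before (single-character symbols, a RE in SOPF stored as a set of strings). The name `omit_arc_re` is hypothetical, and the sketch reads the algorithm as removing the terms that contain $v_i v_j$ and adding the union of the head and tail subterms, as the counts $|A'| + |B'|$ and $|C|$ in the accompanying text suggest.

```python
def pt(terms, s):
    return {t for t in terms if s in t}


def ht(terms, s):
    return {t[: t.index(s) + len(s)] for t in terms if s in t}


def tt(terms, s):
    return {t[t.rindex(s):] for t in terms if s in t}


def omit_arc_re(R, vi, vj):
    """Update the set of product terms R for the omitted arc (vi, vj)."""
    A = pt(R, vi)
    B = pt(R, vj)
    C = pt(R, vi + vj)           # terms that use the arc itself
    # If every path through vi used the arc, vi becomes a finish node.
    heads = ht(A, vi) if A == C else set()
    # If every path through vj used the arc, vj becomes a start node.
    tails = tt(B, vj) if B == C else set()
    return (R - C) | heads | tails


R = {
    "abdghilmpq", "abdghjklmpq", "abdghjknopq",
    "acdghilmpq", "acdghjklmpq", "acdghjknopq",
    "acefghilmpq", "acefghjklmpq", "acefghjknopq",
}
R_cd = omit_arc_re(R, "c", "d")   # c and d keep other arcs
R_kn = omit_arc_re(R, "k", "n")   # n loses its only ingoing arc
print(sorted(R_cd))
print(sorted(R_kn))
```

Omitting (c, d) only deletes the three terms containing "cd", whereas omitting (k, n) also adds the tail "nopq", because n is left without ingoing arcs and is then treated as a start node.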

In Algorithm 2, the number of product terms to be added to and removed from the RE is given by
$|A'| + |B'|$ and $|C|$, respectively, where $|A'|$ is the number of (distinct) head subterms leading to the
node $v_i$ from the start nodes, $|B'|$ is the number of (distinct) tail subterms leading to the finish nodes
from the node $v_j$ and $|C|$ is the number of (distinct) product terms containing the sequence $v_i v_j$.

The worst case time complexity of Algorithm 2 is $O(|P|^2 |p|)$ (see Table 1 for the complexity of
auxiliary algorithms), and it runs in finite time.

3.2 Node operators 

As a next step, omission and insertion operations are applied to nodes.

3.2.1 Node insertion

Node insertion is a higher level operation when compared to arc manipulation operations, because it
generally requires connecting the node to the remaining nodes. To do this, first, the inserted node is
considered as a valid start and finish node. Later, the following arc insertions take place. Accordingly,
Algorithm 3 can be applied to update the corresponding RE with the insertion of the node $v_i$ together
with the arcs $(v_i, x_j)$ and $(y_k, v_i)$, where $v_i \notin V$ and $x_j, y_k \in V$ for $j = 1, \ldots, s$ and $k = 1, \ldots, t$, to the
DG.
Input: R - a RE in SOPF,
       v_i ∉ V - node to be inserted
       (v_i, x_j), where x_j ∈ V, j = 1, ..., s - outgoing arcs to be inserted
       (y_k, v_i), where y_k ∈ V, k = 1, ..., t - ingoing arcs to be inserted
Output: R - RE is updated with the insertion of the node v_i in SOPF

R = R + v_i                                   // Add the symbol v_i as a product term to R
for each (v_i, x_j) do
    insert the arc (v_i, x_j) and update R    // See Algorithm 1
endfor
for each (y_k, v_i) do
    insert the arc (y_k, v_i) and update R    // See Algorithm 1
endfor
if s ≥ 1 or t ≥ 1 then
    R = R - v_i                               // Remove the product term v_i from R
endif
Algorithm 3: Node Insertion

It is straightforward to note that, given that the set union and arc insertion operations run in finite time,
Algorithm 3 runs in finite time, and it has $O((s + t)|P|^2 |p|)$ worst case time complexity where $s$ and $t$
are the numbers of outgoing and ingoing arcs to be inserted, respectively. Note that, in a DG with no
cycles, $(s + t) < n$ always holds and $|p|$ can be chosen to be $n$.

3.2.2 Node omission 

Node omission entails the deletion of the node and the arcs related to it, and therefore is also a higher
level operation with respect to the arc manipulation operations. For omission of a node, the node is
disconnected from the rest of the graph and considered as a valid start and finish node, and later removed.
Algorithm 4 shows the steps to update the corresponding RE with the omission of the node $v_i$ (and all
the arcs $(v_i, x_j)$ and $(y_k, v_i)$, where $x_j, y_k \in V$) from the DG.

Input: R - a RE in SOPF,
       v_i ∈ V - node to be omitted
       (v_i, x_j), where x_j ∈ V, j = 1, ..., s - outgoing arcs to be omitted
       (y_k, v_i), where y_k ∈ V, k = 1, ..., t - ingoing arcs to be omitted
Output: R - RE is updated with the omission of the node v_i in SOPF

for each (v_i, x_j) do
    omit the arc (v_i, x_j) and update R    // See Algorithm 2
endfor
for each (y_k, v_i) do
    omit the arc (y_k, v_i) and update R    // See Algorithm 2
endfor
R = R - v_i                                 // Remove the product term v_i from R
Algorithm 4: Node Omission
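
Node omission can be sketched in Python as a sequence of arc omissions followed by the removal of the isolated node. As in the earlier sketches, single-character symbols and a set-of-strings representation of the RE are assumptions, and the names `omit_arc_re` and `omit_node_re` are hypothetical.

```python
def pt(terms, s):
    return {t for t in terms if s in t}


def ht(terms, s):
    return {t[: t.index(s) + len(s)] for t in terms if s in t}


def tt(terms, s):
    return {t[t.rindex(s):] for t in terms if s in t}


def omit_arc_re(R, vi, vj):
    """Arc omission on the product terms, following the steps of Algorithm 2."""
    A, B, C = pt(R, vi), pt(R, vj), pt(R, vi + vj)
    heads = ht(A, vi) if A == C else set()
    tails = tt(B, vj) if B == C else set()
    return (R - C) | heads | tails


def omit_node_re(R, v, out_targets, in_sources):
    """Omit node v together with its outgoing and ingoing arcs."""
    for x in out_targets:        # arcs (v, x)
        R = omit_arc_re(R, v, x)
    for y in in_sources:         # arcs (y, v)
        R = omit_arc_re(R, y, v)
    return R - {v}               # drop the now isolated node


R = {
    "abdghilmpq", "abdghjklmpq", "abdghjknopq",
    "acdghilmpq", "acdghjklmpq", "acdghjknopq",
    "acefghilmpq", "acefghjklmpq", "acefghjknopq",
}
R_n = omit_node_re(R, "n", out_targets=["o"], in_sources=["k"])
print(sorted(R_n))
```

Omitting the node n from the RE of Figure 1 removes the three terms that pass through n and leaves "opq" behind, since o no longer has an ingoing arc and is treated as a start node.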

In Algorithm 4, the arc omission and set difference operations take a finite number of steps to complete.
Furthermore, since the DG has no cycles, the loops are executed at most $n$ times. Thus, the
algorithm runs in finite time. In addition, in the worst case, the running time complexity of Algorithm 4 is
$O(k|P|^2 |p|)$ where $k$ is the total number of arcs to be omitted. Also, in a DG with no cycles, $k < n$
always holds and choosing $|p| = n$ is valid.
4 Conclusion and future work

This paper considers the effects of basic DG manipulations on the corresponding RE and outlines
algorithms to transform the RE accordingly, where the DG contains no cycles. Hence, it is an initial step
to lay out the correspondence between DG and RE mutations from a practical point of view. Some of the
main implications of the study, so far, can be summarized in two parts:

(i) Format of the RE: The size of a RE can be defined as its length, i. e., the total number of symbols 
and operators in the RE, and is determined by its format. The size, thus the format, of the RE has a direct 
effect on the efficiency of the operations. Unfortunately, SOPF is a kind of "worst-case" format where 
the compactness is not a concern. However, it helps to keep the algorithms straightforward and simple, 
and it seems easier to conserve since no additional transformations are required to preserve the format 
of the RE. Nevertheless, the derived complexity values should be interpreted as the "worst" of the worst 
case time complexity values (keeping in mind that this does not always lead to worst performance in 
practice). 

(ii) Extent of the approach: The DG in the present paper are assumed to be free of cycles, but this does
not necessarily mean that DG models which contain cycles are completely out of scope. One can
apply different cycle omission strategies, such as traveling cycles at most a predefined number of times,
in combination with the underlying semantics of the system and the indexing mechanism to update or
"flatten" the DG model. Inevitably, the resulting model is only a submodel; however, in practice, there
might be cases where it is preferable.

On the other hand, our future work will include DG with cycles and enhance the format of the RE
without sacrificing the (practical) efficiency which might stem from possible additional transformations.
It is one of our concerns to improve the compactness of the RE by keeping it in another format (such as
the product of sums format (POSF), which seems to be more promising). However,
it would be preferable to develop an approach which handles the manipulation operators in an
algebraic manner regardless of the format of the RE.
Appendix. Some auxiliary functions and their complexity

Worst case time complexity values of some related auxiliary functions are given in Table 1. In order to
interpret the complexity values correctly, note the following:

• $|P|$ is an upper bound on the number of product terms in P, i. e., the number of product terms in
P, and $|p|$ is an upper bound on the lengths of the product terms in P, i. e., the length of the longest
product term in P.

• $|p'|$ is the length of the product term p'.

• $|s|$ is the length of the string s.

Note that the sets A, B and C, and the RE R are also sets of product terms.



Table 1: Worst Case Time Complexity Values for Some Auxiliary Functions

Function                                                      Complexity
------------------------------------------------------------  ----------------------------
Removal of a product term from a set: P = P - p'              $O(|P|(|p| + |p'|))$
Addition of a product term to a set: P = P + p'               $O(|P|(|p| + |p'|))$
Set Union: C = A ∪ B                                          $O(|A||B|(|a| + |b|))$
Set Concatenation: C = A.B                                    $O((|A||B|)^2 (|a| + |b|))$
Extraction of Tail Product Terms: tt(P,s) where |s| = 1,2     $O(|P|^2 |p|)$
Extraction of Head Product Terms: ht(P,s) where |s| = 1,2     $O(|P|^2 |p|)$
Extraction of Product Terms: pt(R,s) where |s| = 1,2          $O(|P|^2 |p|)$



